Abstract

A geometric analysis of protein folding, which complements many of the models in the
literature, is presented. We examine the process from unfolded strand to the point where
the strand becomes self-interacting. A central question is how it is possible that so many
initial configurations proceed to fold to a unique final configuration. We put energy and
dynamical considerations temporarily aside and focus upon the geometry alone. We
parameterize the structure of an idealized protein using the concept of a ribbon from
differential geometry. The deformation of the ribbon is described by introducing a generic
twisting Ansatz. The folding process in this picture entails a change in shape guided by
the local amino acid geometry. The theory is reparameterization invariant from the start,
so the final shape is independent of folding time. We develop differential equations for
the changing shape. For some parameter ranges, a sine-Gordon torsion soliton is found. This
purely geometric waveform has properties similar to dynamical solitons. Namely, a threshold
distortion of the molecule is required to initiate the soliton, after which small additional
distortions do not change the waveform. In this analysis, the soliton twists the molecule
until bonds form. The analysis reveals a quantitative relationship between the geometry of
the amino acids and the folded form.

1 Introduction 

In the more than half a century since it was established that the amino acid sequence of a
protein molecule determines the unique folded configuration [1] and that unfolded proteins
re-fold from some range of initial conditions to the same end state, a wide range of
physics-based and geometry-based models have been intensively studied. Recent reviews can be
found in [2], [3], and [4], and there are many books that deal with the subject [5], [6], [7]. The
fact that proteins fold spontaneously from a range of initial configurations to a unique end
state, in spite of the small energy available, is astonishing. In particular, there are many
forces involved [2], and each of these is related to a change in shape through some non-linear,
non-local (and possibly temperature-dependent) material response tensor.

Particularly challenging to our understanding is the initial phase, which ends with the
molecule becoming self-interacting. The insensitivity to variation of initial conditions and
the immunity to noise and to ambient conditions lead us to conjecture that there is some
essentially geometric aspect of folding that guides this initial stage. In this paper, we put the
dynamics, such as [8], and energy landscapes [3] temporarily aside and analyze the geometry of
folding using differential geometry, which is the natural mathematical language for describing
shapes and changes of shape. Our analysis leads to a set of differential equations which are
potentially useful to model-builders. For a limited range of parameters, we solve the
equations analytically and find that a torsion-wave soliton emerges. This soliton lives in a space
of possible molecular shapes, wherein it describes a twisting deformation, which ultimately
stops when the molecule becomes self-interacting. This soliton is different from the various
dynamical solitons that have appeared in the literature for some time, e.g. [9], but it has
similar characteristics; namely, a threshold for formation, stability against noise, a hierarchy
of forms, and a non-linear superposition principle.

Our starting point is a construct known as a ribbon, which is a pair of writhing space 
curves. One space curve, the base curve, aligns with an average backbone of the molecule. 
The other curve, the neighboring curve, carries information about the local (amino acid) 
geometry, especially the location of the side-chains, and writhes about the base curve. (The 
ribbon has been previously applied to double-stranded DNA [10] and is compatible with 
the coils and kinks seen in protein molecules.) We introduce a deformation Ansatz, which 
is a generic response to any torque. We have constructed the Ansatz so that the theory 
is reparameterization invariant; the folding is the same no matter how quickly or slowly it 
happens. Differential geometry then leads us to a set of differential equations, which will be 
discussed in some generality elsewhere. In this paper we focus upon a sub-set, which arises 
by making specific, empirically motivated, assumptions about the parameters, and which 
leads to the soliton. We discuss the soliton solutions and how they can be
accommodated to a structure which is segmented, not continuous. One surprising property
of the simplest, antikink, solution is that the more planar the initial shape of the
unfolded molecule, the faster the molecule folds. This property may not generalize to other
solutions.

Before presenting our calculations in the next section, we conclude this section by
remarking that it would be natural to combine our results with models based upon torsion
angle [8] and/or energy landscape. If the geometrical features described here also appear in
models with dynamics, then a step will have been taken toward understanding insensitivity
of folding to initial conditions, to noise, and to environmental factors. We remark that
the reparameterization invariance of our analysis is also encouraging.

2 Ribbons and their deformations 

In this section we discuss ribbons and a certain kind of deformation associated to them. A
ribbon can be viewed as a space curve with a field of planes tangent to the curve. It is
reasonable for a discussion of ribbons to start with a discussion of the differential geometry
of space curves. Thus let $\vec{x} : [0, L] \to E^3$ be an embedded, i.e. one-to-one, curve in Euclidean
3-space given as a function of arc length $s$, where $L$ is the length of the curve $\vec{x}$. We suppose
that $\vec{x}$ has enough differentiability so that all that we discuss exists. We may define the unit
tangent vector field

$$\vec{e}_1 = \vec{x}_s,$$

where the subscript denotes differentiation with respect to $s$. If we assume that the curve $\vec{x}$ is
non-degenerate, i.e., that $\vec{x}_s$ and $\vec{x}_{ss}$ are linearly independent at all points of the curve, then
we can complete the vector field $\vec{e}_1$ to the Frenet frame $\vec{e}_1, \vec{e}_2, \vec{e}_3$, where $\vec{e}_2$ is the principal
normal and $\vec{e}_3$ is the binormal. This moving frame satisfies the well-known Frenet-Serret
equations:

$$d\vec{e}_1 = \kappa\,\vec{e}_2\,ds$$
$$d\vec{e}_2 = (-\kappa\,\vec{e}_1 + \tau\,\vec{e}_3)\,ds$$
$$d\vec{e}_3 = -\tau\,\vec{e}_2\,ds$$

The functions $\kappa$ and $\tau$ give the curvature and torsion of the space curve $\vec{x}$, respectively.
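
As a numerical illustration of the curvature and torsion that appear in the Frenet-Serret
equations, the following minimal sketch estimates both by finite differences for a circular
helix, for which the closed forms $r/(r^2+c^2)$ and $c/(r^2+c^2)$ are standard. The helix, the
function names, and the step size `h` are assumptions chosen for illustration; they are not
part of the model developed in this paper.

```python
import math

def helix(s, r=1.0, c=0.5):
    """Circular helix of radius r and rise parameter c, parameterized by arc length s."""
    w = 1.0 / math.sqrt(r * r + c * c)  # scaling that makes |dx/ds| = 1
    return (r * math.cos(w * s), r * math.sin(w * s), c * w * s)

def _comb(terms):
    """Linear combination sum(coef * vec) of 3-vectors."""
    return tuple(sum(coef * vec[i] for coef, vec in terms) for i in range(3))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def curvature_torsion(curve, s, h=1e-2):
    """Estimate (kappa, tau) at s from central differences of the curve."""
    xm2, xm1, x0, xp1, xp2 = (curve(s + j * h) for j in (-2, -1, 0, 1, 2))
    d1 = _comb([(0.5 / h, xp1), (-0.5 / h, xm1)])                      # x'
    d2 = _comb([(1 / h**2, xp1), (-2 / h**2, x0), (1 / h**2, xm1)])    # x''
    d3 = _comb([(0.5 / h**3, xp2), (-1 / h**3, xp1),
                (1 / h**3, xm1), (-0.5 / h**3, xm2)])                  # x'''
    n = _cross(d1, d2)
    kappa = math.sqrt(_dot(n, n)) / _dot(d1, d1) ** 1.5
    tau = _dot(n, d3) / _dot(n, n)
    return kappa, tau
```

For the default helix ($r = 1$, $c = 0.5$) the estimates should be close to $\kappa = 0.8$ and
$\tau = 0.4$, up to the truncation error of the difference formulas.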

For later use, we wish to point out that we may view $\vec{e}_1$ as a curve taking values in $S^2$,
the unit sphere centered at the origin of $E^3$. If we represent $\vec{e}_1$ as a function of its arc length
$\sigma$, then one may write $\vec{e}_1 : [0, K] \to S^2$, where $K = \int_0^L \kappa\,ds$ is the length of $\vec{e}_1$. It follows
from the definition of the curvature $\kappa$ that

$$\frac{d\sigma}{ds} = \kappa. \tag{1}$$

It is straightforward that the geodesic curvature $k$ of $\vec{e}_1$ is given by

$$k = \frac{\tau}{\kappa}. \tag{2}$$
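
For a circular helix the relation $k = \tau/\kappa$ can be checked by hand, because the unit tangent
then traces a circle of latitude on $S^2$, whose geodesic curvature is the cotangent of its polar
angle. The sketch below is only an illustration under that assumption; the helix parameters
`r`, `c`, `L` and the function name are hypothetical and not taken from the paper.

```python
import math

def helix_indicatrix(r, c, L):
    """Return (K, k, k_circle) for a helix of radius r, rise parameter c, length L."""
    w2 = 1.0 / (r * r + c * c)
    kappa, tau = r * w2, c * w2   # constant curvature and torsion of the helix
    K = kappa * L                 # length of the tangent curve: integral of kappa ds
    k = tau / kappa               # geodesic curvature from the ratio tau / kappa
    # The unit tangent makes a fixed angle alpha with the helix axis, so it traces
    # a circle of latitude at polar angle alpha; that circle has geodesic
    # curvature cot(alpha).
    alpha = math.atan2(r, c)
    k_circle = math.cos(alpha) / math.sin(alpha)
    return K, k, k_circle
```

Both routes give $c/r$ for the helix, which is the agreement one expects from $k = \tau/\kappa$.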

To obtain a ribbon from a space curve we need to associate to the space curve a field of
planes tangent to the space curve. Since at each point of the space curve, the vector $\vec{e}_1$ lies
in the plane tangent at that point, the plane is completely determined by giving a vector
$\vec{v}$ that is perpendicular to $\vec{e}_1$ and lies in the tangent plane. Necessarily $\vec{v}$ is in the plane
spanned by $\vec{e}_2$ and $\vec{e}_3$. Thus we introduce the function $\psi : [0, K] \to \mathbb{R}$ by requiring that the
following hold:

$$\vec{v} = \cos\psi\,\vec{e}_2 + \sin\psi\,\vec{e}_3. \tag{3}$$

The space curve $\vec{x}$ we introduced represents the base curve of our model. The neighboring
curve in our model is represented by $\vec{x} + f\,\vec{e}_2$, where $f$ is a positive function of the arc length
$s$. Our primary interest in this section lies in how this ribbon deforms under a twisting
operation on the base curve which depends upon the neighboring curve at every point. More
specifically we are interested in an essentially adiabatic, reparameterization invariant change
in the shape of the base curve. To this end, we parameterize the variation of the ribbon by
means of a parameter $u$. Thus all quantities under consideration become functions of $s$ and
$u$. For example, the torsion $\tau(s)$ becomes $\tau(s,u)$. In this section, we examine the variation
in the geometry of the ribbon as $u$ ranges over some domain by means of differential equations
in the geometric invariants mentioned in the preceding paragraphs.

Our ansatz is the following equation:

$$\frac{\partial \vec{e}_1}{\partial u} = \gamma(u)\,f(s)\,\vec{v}(u,s).$$

The coefficient $\gamma(u)$ is a positive function chosen for later convenience. Moreover, it is a
function of $u$ so that the form of the equation is invariant under changes of the parameter $u$.
We consider any parameter which is a continuously differentiable function of $u$ with positive
derivative as an admissible parameter for representing the variation. Since $u$ will change over
some interval, time dependence enters indirectly through this parameter. We emphasize that
in this section we are studying changes in shape under a twisting deformation, not dynamics.
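
The invariance under admissible changes of parameter can be made explicit with a one-line
chain-rule computation. The following is a sketch; the new parameter $\tilde{u} = h(u)$ and the
symbol $\tilde{\gamma}$ are notation introduced only for this illustration.

```latex
% Admissible change of parameter: \tilde{u} = h(u) with h'(u) > 0.
\begin{equation*}
\frac{\partial \vec{e}_1}{\partial \tilde{u}}
  = \frac{1}{h'(u)}\,\frac{\partial \vec{e}_1}{\partial u}
  = \frac{\gamma(u)}{h'(u)}\, f(s)\, \vec{v}
  = \tilde{\gamma}(\tilde{u})\, f(s)\, \vec{v},
\qquad
\tilde{\gamma}(\tilde{u}) := \frac{\gamma(u)}{h'(u)} > 0 .
\end{equation*}
% The equation keeps its form, with a new positive coefficient.
% In particular, choosing h'(u) = \gamma(u) gives \tilde{\gamma} = 1.
```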

Given what has just been said we may as well choose a parameter $u$ for which $\gamma = 1$.
Thus we study the variation in the ribbon induced by the following differential equation:

$$\frac{\partial \vec{e}_1}{\partial u}(s,u) = f(s)\,\vec{v}(s,u). \tag{4}$$

We study this variation under the following assumptions: If we view the base curve
as a polygon with atoms at the vertices, then the lengths of the segments and the angles
between successive segments will remain constant under the variation. In our model, which
is differentiable, this corresponds to the following:

1. The element of arc length $ds$ is invariant during the variation.

2. The curvature $\kappa$ is invariant during the variation.

It follows from these assumptions and equation (1) that the element of arc $d\sigma$ remains
unchanged during the variation.

It is well-known that the shape of the base curve $\vec{x}$ is completely determined by $\kappa$ and
$\tau$ given as functions of the arc length $s$. Since $\kappa$ does not depend on $u$, we regard it as a
known function. Thus our goal is to determine how $\tau$ depends on $u$. Since $\tau = k\kappa$, we can
just as well determine how $k$ depends on $u$. Finally, our ansatz can be viewed as defining a
variation of the curve $\vec{e}_1$. Thus we transfer the variational problem to the sphere $S^2$ and study
how $\vec{e}_1$ varies under our ansatz, which amounts to studying how $k$ varies under our ansatz.

In what follows we use the "method of moving frames" and differential forms to make
our calculations. The reader may want to use the text [11] as a reference for what is done
below.

We summarize all we know about $\vec{e}_1$ and the variation in the following equations, where
$\nu$ represents a quantity to be determined. These equations follow from the Frenet-Serret
equations and equations (1), (2), (3) and (4).

$$d\vec{e}_1 = \vec{e}_2\,d\sigma + (f\cos\psi\,\vec{e}_2 + f\sin\psi\,\vec{e}_3)\,du \tag{5}$$
$$d\vec{e}_2 = (-\vec{e}_1 + k\,\vec{e}_3)\,d\sigma + (-f\cos\psi\,\vec{e}_1 + \nu\,\vec{e}_3)\,du \tag{6}$$
$$d\vec{e}_3 = -k\,\vec{e}_2\,d\sigma + (-f\sin\psi\,\vec{e}_1 - \nu\,\vec{e}_2)\,du \tag{7}$$

We compute the exterior derivatives of the above equations and use the fact that $d^2\vec{e}_j = 0$.
From the $\vec{e}_2$ and $\vec{e}_3$ components of $d^2\vec{e}_1$ and the $\vec{e}_3$ component of $d^2\vec{e}_2$, we get the following
equations.

$$k = -\psi_\sigma + \frac{f_\sigma}{f}\cot\psi \tag{8}$$
$$\nu = f_\sigma\csc\psi \tag{9}$$
$$0 = -k_u + \nu_\sigma + f\sin\psi \tag{10}$$

Using equations (8) and (9), we substitute for $k$ and $\nu$ in equation (10) to obtain the
following second order p.d.e. in $\psi$.

$$\left[\psi_\sigma - \frac{f_\sigma}{f}\cot\psi\right]_u + \left[f_\sigma\csc\psi\right]_\sigma + f\sin\psi = 0 \tag{11}$$

If we make the further assumption that $f$ is constant, this equation becomes

$$\psi_{\sigma u} + f\sin\psi = 0, \tag{12}$$

the sine-Gordon equation.
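
The step from the frame equations (5)-(7) to equations (8)-(10) is only summarized above. It
can be sketched as follows, where the connection forms $\omega_{12}$, $\omega_{13}$, $\omega_{23}$ are shorthand
introduced here for illustration, and $f$ is regarded as a function of $\sigma$ (which is possible
because $\sigma$ depends on $s$ alone under the stated assumptions).

```latex
% Connection forms read off from (5)-(7):
\begin{align*}
\omega_{12} &= d\sigma + f\cos\psi\,du, &
\omega_{13} &= f\sin\psi\,du, &
\omega_{23} &= k\,d\sigma + \nu\,du .
\end{align*}
% The e_2 and e_3 components of d^2 e_1 = 0 and the e_3 component of d^2 e_2 = 0 read
\begin{align*}
d\omega_{12} &= -\,\omega_{13}\wedge\omega_{23}, &
d\omega_{13} &= \omega_{12}\wedge\omega_{23}, &
d\omega_{23} &= -\,\omega_{12}\wedge\omega_{13} .
\end{align*}
% Comparing the coefficients of d\sigma \wedge du in each equation:
\begin{align*}
(f\cos\psi)_\sigma = k f\sin\psi
  \quad&\Longrightarrow\quad k = -\psi_\sigma + \frac{f_\sigma}{f}\cot\psi , \\
(f\sin\psi)_\sigma = \nu - k f\cos\psi
  \quad&\Longrightarrow\quad \nu = f_\sigma\csc\psi , \\
\nu_\sigma - k_u = -f\sin\psi
  \quad&\Longrightarrow\quad 0 = -k_u + \nu_\sigma + f\sin\psi .
\end{align*}
```

The second implication uses the expression for $k$ from the first, together with
$\sin^2\psi + \cos^2\psi = 1$.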

We consider the implications of this equation given that it has soliton solutions. If the
base curve is initially fairly planar, i.e., its torsion (or equivalently $k$) is not too far from zero
along its entire length, we consider how a soliton might explain the folding which is always
observed. Note, for later use, that if $k$ is close to zero over the entire length of the base curve, it
follows from equation (8) (assuming $f$ is constant) that $\psi$ is close to being constant.

We base our arguments on antikinks, which for us take the form

$$\psi(\sigma,u) = 4\tan^{-1}\left[\exp\left(\sqrt{f}\left(au - \frac{\sigma}{a} + b\right)\right)\right] \tag{13}$$

where $a > 0$ and $b$ are constants.
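
That the antikink $\psi = 4\tan^{-1}[\exp(\sqrt{f}(au - \sigma/a + b))]$ satisfies the sine-Gordon equation
$\psi_{\sigma u} + f\sin\psi = 0$ can be checked numerically. The sketch below is a minimal check by
central differences; the function names, the default parameter values, and the step `h` are
assumptions made for illustration.

```python
import math

def antikink(sigma, u, f=1.0, a=2.0, b=0.0):
    """Antikink: 4*atan(exp(sqrt(f)*(a*u - sigma/a + b)))."""
    theta = math.sqrt(f) * (a * u - sigma / a + b)
    return 4.0 * math.atan(math.exp(theta))

def sine_gordon_residual(sigma, u, f=1.0, a=2.0, b=0.0, h=1e-3):
    """Finite-difference estimate of psi_{sigma u} + f*sin(psi); near zero for a solution."""
    def p(ds, du):
        return antikink(sigma + ds, u + du, f, a, b)
    # Standard four-point stencil for the mixed partial derivative.
    psi_su = (p(h, h) - p(h, -h) - p(-h, h) + p(-h, -h)) / (4.0 * h * h)
    return psi_su + f * math.sin(p(0.0, 0.0))
```

The residual is not exactly zero because of the truncation error of the stencil, but it should
shrink as `h` is reduced (until rounding error dominates).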



We need to recall that the partial differential equation (11), and hence (12), is defined
on the domain $[0,K] \times [0,U]$, where $U$ may be some positive real or $\infty$. Thus one must
consider either equation as part of an initial value problem on that domain. Since the curves
$\sigma = \text{constant}$ and $u = \text{constant}$ are the characteristic curves of either partial differential
equation, we need to consider our problem as a characteristic initial value problem. Thus it
is natural to suppose that the values of $\psi$ are given on the curves $u = 0$ and $\sigma = 0$.
Then we must accept as known that

$$\psi(\sigma, 0) = 4\tan^{-1}\left[\exp\left(\sqrt{f}\left(-\frac{\sigma}{a} + b\right)\right)\right].$$

Given that initially we assume the base curve is fairly planar, the function $\psi$ is close to being
a constant function on $[0,K]$. We choose the constant $b$ so that $4\tan^{-1}\left[\exp\left(\sqrt{f}\,b\right)\right]$ approximates
that constant value and $a$ very large so that $4\tan^{-1}\left[\exp\left(\sqrt{f}\left(-\frac{\sigma}{a} + b\right)\right)\right]$ approximates
that constant value on the interval $[0,K]$.
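
The role of a large $a$ in flattening the initial data can be illustrated numerically. The
following sketch samples $\psi(\sigma, 0) = 4\tan^{-1}[\exp(\sqrt{f}(-\sigma/a + b))]$ on $[0, K]$ and reports how far it
strays from a constant; the interval length `K = 10`, the sample count, and the function names
are assumptions chosen only for illustration.

```python
import math

def initial_profile(sigma, f=1.0, a=1.0, b=0.0):
    """psi(sigma, 0) for the antikink with parameters f, a, b."""
    return 4.0 * math.atan(math.exp(math.sqrt(f) * (-sigma / a + b)))

def spread_on_interval(a, K=10.0, f=1.0, b=0.0, n=200):
    """Max minus min of psi(sigma, 0) sampled at n + 1 points of [0, K]."""
    values = [initial_profile(K * i / n, f, a, b) for i in range(n + 1)]
    return max(values) - min(values)
```

With these defaults the spread decreases as `a` grows (compare `a = 1`, `10`, `100`), which is
the sense in which a nearly constant initial $\psi$ calls for a large value of $a$.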
We must also accept as known that

$$\psi(0,u) = 4\tan^{-1}\left[\exp\left(\sqrt{f}\,(au + b)\right)\right].$$

If a vibration of the left end point of the base curve can be represented by this function, then
the antikink given by equation (13) describes the subsequent motion of the base curve in the
following fashion. As $u$ increases in value, the antikink moves along the base curve and
simultaneously, due to equation (8), a "bump" of geodesic curvature, which corresponds to
a "bump" of torsion, moves along the base curve. The effect of this "bump" of torsion is to
twist the base curve into a position where bonds are sure to be formed. At this point, our
model is no longer viable and the "bump" leaves in its wake the twisted form of the base
chain of the protein.
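
The moving "bump" can be made concrete for constant $f$. In that case $k = -\psi_\sigma$, and for the
antikink this works out to $k = (2\sqrt{f}/a)\,\mathrm{sech}\,\theta$ with $\theta = \sqrt{f}(au - \sigma/a + b)$, so the peak sits
where $\theta = 0$, at $\sigma = a(au + b)$. The sketch below encodes these formulas; the constant
curvature `kappa` used to convert $k$ into torsion is a placeholder assumption, as are the
default parameter values.

```python
import math

def geodesic_curvature(sigma, u, f=1.0, a=2.0, b=-3.0):
    """k = -psi_sigma for the antikink with constant f: (2*sqrt(f)/a) * sech(theta)."""
    theta = math.sqrt(f) * (a * u - sigma / a + b)
    return (2.0 * math.sqrt(f) / a) / math.cosh(theta)

def bump_position(u, a=2.0, b=-3.0):
    """Value of sigma at which theta = 0, i.e. where k is largest."""
    return a * (a * u + b)

def torsion(sigma, u, kappa=1.0, **params):
    """tau = k * kappa, with a constant curvature kappa as a placeholder."""
    return kappa * geodesic_curvature(sigma, u, **params)
```

In this parameterization the peak advances by $a^2$ per unit of $u$, and its height $2\sqrt{f}/a$ does
not change as it travels.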

Should the initial values of $\psi$ be different and the vibrations of the left end point be of a
different form as well, one can presume that there are other solitons which satisfy these initial
conditions and thus ultimately produce a twist that moves along the base curve leading to
the formation of a stable twisted molecule.

The process just described for antikinks can, in fact, lead to similar conclusions if one
assumes that $f$ is a piecewise constant function, rather than a constant function on $[0,K]$.
Let's suppose that $f$ takes the values $f_1$ and $f_2$ on the subintervals $[0, \sigma_1]$ and $[\sigma_1, \sigma_2]$ of $[0, K]$,
respectively. To construct a continuous, piecewise differentiable solution of equation (12) we
need the following to be true, for all $u$ in $[0, U]$:

$$\sqrt{f_1}\left(a_1 u - \frac{\sigma_1}{a_1} + b_1\right) = \sqrt{f_2}\left(a_2 u - \frac{\sigma_1}{a_2} + b_2\right).$$

If we regard $a_1$ and $b_1$ as known, we can clearly choose $a_2$ and $b_2$ for this to be true. Thus if
we again assume that $\psi$ is fairly constant on $[0, K]$, one can still find solutions of equation (12)
that give rise to a moving "bump" of torsion along the base curve.
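
The choice of $a_2$ and $b_2$ can be written down explicitly by matching the antikink phase
$\sqrt{f}(au - \sigma/a + b)$ on the two sides of $\sigma_1$: equating the coefficients of $u$ gives
$a_2 = a_1\sqrt{f_1/f_2}$, and equating the remaining terms fixes $b_2$. The sketch below implements
this; the function names and the sample values used to exercise it are assumptions for
illustration only.

```python
import math

def theta(sigma, u, f, a, b):
    """Phase of the antikink, sqrt(f) * (a*u - sigma/a + b)."""
    return math.sqrt(f) * (a * u - sigma / a + b)

def match_segment(f1, a1, b1, f2, sigma1):
    """Choose (a2, b2) so that the phase is continuous at sigma1 for every u."""
    ratio = math.sqrt(f1 / f2)
    a2 = ratio * a1                                  # equates the coefficients of u
    b2 = ratio * (b1 - sigma1 / a1) + sigma1 / a2    # equates the u-independent terms
    return a2, b2
```

For example, `match_segment(1.0, 2.0, -1.0, 4.0, 3.0)` yields parameters for the second
segment whose phase agrees with the first at $\sigma_1 = 3$ for all $u$. Only continuity is imposed;
the derivative $\psi_\sigma$ generally jumps at $\sigma_1$, consistent with a piecewise differentiable solution.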

Our parameter $u$ does not represent time but must be a monotonically increasing function
of time. Even though we are not dealing with dynamics, if we want to bring time into our
considerations then our ansatz becomes

$$\frac{\partial \vec{e}_1}{\partial t}(s,t) = \gamma(t)\,f(s)\,\vec{v}(s,t)$$

for some positive real-valued function $\gamma(t)$. One easily finds that equation (11) becomes

$$\left[\psi_\sigma - \frac{f_\sigma}{f}\cot\psi\right]_t + \gamma\left[f_\sigma\csc\psi\right]_\sigma + \gamma f\sin\psi = 0$$

and the sine-Gordon equation takes the form

$$\psi_{\sigma t} + \gamma f\sin\psi = 0.$$

The formula for an antikink becomes

$$\psi(\sigma,t) = 4\tan^{-1}\left[\exp\left(\sqrt{f}\left(a\,g(t) - \frac{\sigma}{a} + b\right)\right)\right]$$

where $\frac{dg}{dt} = \gamma$ and $g(0) = 0$.
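
The reparameterization invariance can be seen directly in the time-dependent antikink: two
different positive functions $\gamma(t)$ produce the same shape whenever the accumulated parameter
$g(t) = \int_0^t \gamma$ is the same. The sketch below illustrates this with a trapezoid-rule integral;
the two sample functions `steady` and `ramp`, the defaults, and the function names are
hypothetical choices for illustration.

```python
import math

def g_of_t(gamma, t, n=1000):
    """g(t) = integral of gamma from 0 to t by the trapezoid rule, so that g(0) = 0."""
    h = t / n
    total = 0.5 * (gamma(0.0) + gamma(t))
    total += sum(gamma(i * h) for i in range(1, n))
    return h * total

def antikink_in_time(sigma, t, gamma, f=1.0, a=2.0, b=-3.0):
    """Antikink with u replaced by g(t)."""
    theta = math.sqrt(f) * (a * g_of_t(gamma, t) - sigma / a + b)
    return 4.0 * math.atan(math.exp(theta))

def steady(t):
    return 1.0        # constant gamma, g(t) = t

def ramp(t):
    return 2.0 * t    # accelerating gamma, g(t) = t**2
```

Here `steady` at $t = 4$ and `ramp` at $t = 2$ both give $g = 4$, so the two profiles
$\psi(\sigma, t)$ coincide even though they are reached at different times.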

If we suppose $g$ is fairly constant and depends primarily upon the medium in which the
protein is found (as opposed to depending upon the protein itself), then we can argue as
follows. The more planar the initial shape of the base curve, the closer to being
constant the initial values of $\psi$ are. Hence, the larger the value of $a$ must be so that the
antikink approximates well those initial values. However, the larger the value of $a$, the faster
the soliton moves along the base curve, and hence the faster the "bump" of torsion moves
along the base curve and consequently the faster the formation of the twisted molecule.

3 Conclusions



In this paper we have presented the results of a purely geometric analysis of the protein 
folding process. Our most important results are as follows. 

i. We parameterized a coarse-grained model of a protein molecule, which quantitatively
describes only the shape of the molecule; the description is independent of position
and orientation. The parameterization includes the backbone and a distillation of the
geometry of the side-chains that we refer to as the neighboring curve. We call this
model a ribbon.

This parameterization is particularly useful for geometrical and structural studies,
provided that the coarse-graining is appropriate. The ribbon is introduced to allow
for analytic studies using differential geometry.

ii. We extended this parameterization to include changes of shape. This parameterization,
which again depends only upon shape, not upon position, orientation, or space-time
motion, defines a curve, i.e., a trajectory, of possible protein shapes. We constrained
the possible shapes in some ways that are appropriate by fixing length and bending.
Changes in the twisting of the protein shapes are allowed and reflect the presence of
amino acids through the idealized neighboring curve acting on the backbone.

A molecule in this description follows some trajectory from unfolded to partly folded,
where chemical bonds form.

Our formulation is reparameterization invariant from the outset; therefore the folding
geometry is independent of wall-clock time.

iii. Most importantly, we used the above parameterization to study possible trajectories
in the case of an ad hoc but not unreasonable constraint on the shape. We found
a trajectory associated to a soliton solution of the sine-Gordon equation, producing
what one might call a torsion soliton. The soliton produces a torsional distortion of
the molecule. The distortion of the molecule is fixed, because of bond formation, during
propagation of the soliton.

This soliton, which arises from geometric relationships within the folding molecule, is
geometrical and thus different from dynamical solitons that are well known in protein
science. However, it has similar properties.

a. Its stability, i.e., the fact that it propagates without continuous input of energy and is
indifferent to scattering, temperature, or forces that may vary from cell to cell,
may explain how it is possible that the folding is unique in spite of the variety of
forces, response tensors, and environmental conditions involved.

b. If this soliton occurs in nature, it may also explain the other puzzles that were
raised in the Introduction: a threshold distortion of the molecule is required to
establish the soliton, but the sensitivity to initial conditions is minimal.

c. The soliton describes a self-focusing torsional wave. Since energy was temporarily
put aside at the outset, the energetics of the soliton remains unspecified here.
Obviously, the soliton will be relevant only if it is energetically allowed, but the fact
that it arises from geometry alone suggests that a relatively flat energy landscape
might suffice.